scene element
BiMa: Towards Biases Mitigation for Text-Video Retrieval via Scene Element Guidance
Le, Huy, Chung, Nhat, Kieu, Tung, Nguyen, Anh, Le, Ngan
Text-video retrieval (TVR) systems often suffer from visual-linguistic biases present in datasets, which cause pre-trained vision-language models to overlook key details. To address this, we propose BiMa, a novel framework designed to mitigate biases in both visual and textual representations. Our approach begins by generating scene elements that characterize each video by identifying relevant entities/objects and activities. For visual debiasing, we integrate these scene elements into the video embeddings, enhancing them to emphasize fine-grained and salient details. For textual debiasing, we introduce a mechanism to disentangle text features into content and bias components, enabling the model to focus on meaningful content while separately handling biased information. Extensive experiments and ablation studies across five major TVR benchmarks (i.e., MSR-VTT, MSVD, LSMDC, ActivityNet, and DiDeMo) demonstrate the competitive performance of BiMa. Additionally, the model's bias mitigation capability is consistently validated by its strong results on out-of-distribution retrieval tasks.
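A minimal sketch of the two debiasing ideas described above, with NumPy stand-ins: the function names, linear heads, and mixing rule are hypothetical illustrations, not BiMa's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def disentangle(text_emb, W_content, W_bias):
    """Textual debiasing sketch: split a text embedding into content and
    bias components via two linear heads (hypothetical stand-ins for the
    paper's disentanglement mechanism)."""
    return text_emb @ W_content, text_emb @ W_bias

def fuse_scene_elements(video_emb, element_embs, alpha=0.5):
    """Visual debiasing sketch: mix the mean scene-element embedding into
    the video embedding to emphasize fine-grained, salient details."""
    return (1 - alpha) * video_emb + alpha * element_embs.mean(axis=0)

d = 8
text_emb = rng.normal(size=d)
W_content, W_bias = rng.normal(size=(d, d)), rng.normal(size=(d, d))
content, bias = disentangle(text_emb, W_content, W_bias)

video_emb = rng.normal(size=d)
element_embs = rng.normal(size=(3, d))  # e.g. embeddings of detected entities/activities
debiased_video = fuse_scene_elements(video_emb, element_embs)
# Retrieval would then score debiased_video against the content component only,
# handling the bias component separately.
```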
Physically Plausible 3D Human-Scene Reconstruction from Monocular RGB Image using an Adversarial Learning Approach
Biswas, Sandika, Li, Kejie, Banerjee, Biplab, Chaudhuri, Subhasis, Rezatofighi, Hamid
Holistic 3D human-scene reconstruction is a crucial and emerging research area in robot perception. A key challenge in holistic 3D human-scene reconstruction is to generate a physically plausible 3D scene from a single monocular RGB image. Existing research mainly proposes optimization-based approaches for reconstructing the scene from a sequence of RGB frames with explicitly defined physical laws and constraints between different scene elements (humans and objects). However, it is hard to explicitly define and model every physical law in every scenario. This paper proposes using an implicit feature representation of the scene elements to distinguish a physically plausible alignment of humans and objects from an implausible one. We propose using a graph-based holistic representation with an encoded physical representation of the scene to analyze the human-object and object-object interactions within the scene. Using this graphical representation, we adversarially train our model to learn the feasible alignments of the scene elements from the training data itself, without explicitly defining the laws and constraints between them. Unlike existing inference-time optimization-based approaches, we use this adversarially trained model to produce a per-frame 3D reconstruction of the scene that abides by the physical laws and constraints. Our learning-based method achieves comparable 3D reconstruction quality to existing optimization-based holistic human-scene reconstruction methods without requiring inference-time optimization, which makes it better suited than existing methods for potential use in robotic applications such as robot navigation.
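A toy sketch of the discriminator side of this idea: encode pairwise relations between scene elements, then score how feasible the alignment looks. The feature encoding and the tiny MLP below are invented for illustration (weights are random here; in the adversarial setup they would be trained against implausible configurations).

```python
import numpy as np

rng = np.random.default_rng(1)

def pairwise_features(nodes):
    """Encode each human-object / object-object pair by relative position
    and distance (a stand-in for the paper's encoded physical representation
    on the scene graph)."""
    feats = []
    for i in range(len(nodes)):
        for j in range(i + 1, len(nodes)):
            diff = nodes[i] - nodes[j]
            feats.append(np.concatenate([diff, [np.linalg.norm(diff)]]))
    return np.stack(feats)

def plausibility_score(feats, W, v):
    """Toy discriminator: one-hidden-layer MLP averaging a feasibility
    score over all scene-element pairs."""
    h = np.tanh(feats @ W)
    return float(np.mean(h @ v))

nodes = rng.normal(size=(4, 3))  # 3-D positions of 4 scene elements
feats = pairwise_features(nodes)
W = rng.normal(size=(4, 8))
v = rng.normal(size=8)
score = plausibility_score(feats, W, v)
```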
SyntEO: Synthetic Dataset Generation for Earth Observation with Deep Learning -- Demonstrated for Offshore Wind Farm Detection
Hoeser, Thorsten, Kuenzer, Claudia
With the emergence of deep learning in recent years, new opportunities arose in Earth observation research. Nevertheless, they also brought new challenges. The data-hungry training processes of deep learning models demand large, resource-expensive annotated datasets and have partly replaced knowledge-driven approaches, so that model behaviour and the final prediction process became a black box. The proposed SyntEO approach enables Earth observation researchers to automatically generate large deep-learning-ready datasets and thus free up otherwise occupied resources. SyntEO does this by including expert knowledge in the data generation process in a highly structured manner. In this way, fully controllable experiment environments are set up, which support insights into the model training. Thus, SyntEO makes the learning process approachable and model behaviour interpretable, an important cornerstone for explainable machine learning. We demonstrate the SyntEO approach by predicting offshore wind farms in Sentinel-1 images over two of the world's largest offshore wind energy production sites. The largest generated dataset has 90,000 training examples. A basic convolutional neural network for object detection, trained only on this synthetic data, confidently detects offshore wind farms while minimising false detections in challenging environments. In addition, four sequential datasets are generated, demonstrating how the SyntEO approach can precisely define the dataset structure and influence the training process. SyntEO is thus a hybrid approach that creates an interface between expert knowledge and data-driven image analysis.
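The core idea — expert knowledge encoded as structured generation rules that emit images together with ready-made annotations — can be sketched as follows. The layout rule, image size, and target rendering below are invented toy choices, not SyntEO's actual generation pipeline.

```python
import numpy as np

rng = np.random.default_rng(42)

def synthesize_scene(size=128, n_turbines=5, spacing=20):
    """Toy synthetic Earth-observation sample in the spirit of SyntEO:
    expert knowledge is encoded as a placement rule (turbines sit on a
    regular line, mimicking real wind-farm layouts), the background
    mimics sea clutter, and each sample comes with bounding-box
    annotations for free."""
    image = rng.normal(0.0, 0.1, size=(size, size))  # sea-clutter background
    boxes = []
    for i in range(n_turbines):
        x = 10 + i * spacing
        y = 10 + i * spacing
        image[y - 2:y + 3, x - 2:x + 3] += 1.0       # bright point target
        boxes.append((x - 2, y - 2, x + 2, y + 2))   # annotation generated with the image
    return image, boxes

image, boxes = synthesize_scene()
```

Because every dataset property is a parameter of the generator, sequential datasets with precisely controlled structure (as in the four-dataset experiment above) fall out naturally.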
A Knowledge-based Approach for the Automatic Construction of Skill Graphs for Online Monitoring
Jatzkowski, Inga, Menzel, Till, Maurer, Markus
Automated vehicles need to be aware of the capabilities they currently possess. Skill graphs are directed acyclic graphs in which a vehicle's capabilities and the dependencies between these capabilities are modeled. The skills a vehicle requires depend on the behaviors the vehicle has to perform and on the operational design domain (ODD) of the vehicle. Skill graphs were originally proposed for online monitoring of the current capabilities of an automated vehicle. They have also been shown to be useful during other parts of the development process, e.g., system design and system verification. Skill graph construction is an iterative, expert-based, manual process with little to no guidelines. This process is, thus, prone to errors and inconsistencies, especially regarding the propagation of changes in the vehicle's intended ODD into the skill graphs. In order to circumvent this problem, we propose to formalize expert knowledge regarding skill graph construction into a knowledge base and to automate the construction process. Thus, all changes in the vehicle's ODD are reflected in the skill graphs automatically, leading to a reduction in inconsistencies and errors in the constructed skill graphs.
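A minimal illustration of automatic skill-graph construction from a knowledge base: required behaviors are expanded transitively into a directed acyclic graph of skills. The knowledge-base entries below are invented examples, not the paper's actual skill taxonomy.

```python
# Toy knowledge base mapping each skill/behavior to the skills it depends on.
KNOWLEDGE_BASE = {
    "lane keeping": ["lateral control", "lane detection"],
    "lateral control": ["steering actuation"],
    "lane detection": ["camera perception"],
}

def build_skill_graph(behaviors, kb):
    """Expand the required behaviors into (nodes, edges) of a skill DAG
    by transitively following dependencies in the knowledge base."""
    nodes, edges = set(), set()
    stack = list(behaviors)
    while stack:
        skill = stack.pop()
        if skill in nodes:
            continue
        nodes.add(skill)
        for dep in kb.get(skill, []):
            edges.add((skill, dep))  # edge: skill depends on dep
            stack.append(dep)
    return nodes, edges

skills, edges = build_skill_graph(["lane keeping"], KNOWLEDGE_BASE)
```

Changing the ODD then means changing the behavior list or the knowledge base and regenerating the graph, rather than editing graphs by hand.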
A Scenario-Based Development Framework for Autonomous Driving
We systematically analyzed previous research and propose a definition of the scenario concept, the elements of the scenario ontology, the data sources for scenarios, the processing methods for scenario data, and a scenario-based V-Model. Moreover, we summarize automated test-scenario construction methods based on random scenario generation and dangerous scenario generation.
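The two construction methods can be sketched by sampling from a scenario ontology; the ontology elements and the "dangerous" heuristic below are invented for illustration, not the paper's actual ontology.

```python
import random

random.seed(0)

# Toy scenario ontology (element names and values invented for illustration).
ONTOLOGY = {
    "road": ["highway", "urban street", "roundabout"],
    "weather": ["clear", "rain", "fog"],
    "actor": ["pedestrian", "cyclist", "truck"],
}

def random_scenario(ontology):
    """Random scenario generation: sample one value per ontology element."""
    return {element: random.choice(values) for element, values in ontology.items()}

def dangerous_scenario(ontology):
    """Dangerous scenario generation: bias the sample toward adverse
    conditions (a simple hand-coded heuristic here)."""
    scenario = random_scenario(ontology)
    scenario["weather"] = "fog"        # force low visibility
    scenario["actor"] = "pedestrian"   # vulnerable road user
    return scenario

s = random_scenario(ONTOLOGY)
d = dangerous_scenario(ONTOLOGY)
```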
MONet: Unsupervised Scene Decomposition and Representation
Burgess, Christopher P., Matthey, Loic, Watters, Nicholas, Kabra, Rishabh, Higgins, Irina, Botvinick, Matt, Lerchner, Alexander
Realistic visual scenes contain rich structure, which humans effortlessly exploit to reason effectively and intelligently. In particular, object perception, the ability to perceive and represent individual objects, is considered a fundamental cognitive ability that allows us to understand - and efficiently interact with - the world as perceived through our senses [Johnson, 2018, Green and Quilty-Dunn, 2017]. However, despite recent breakthroughs in computer vision fuelled by advances in deep learning, learning to represent realistic visual scenes in terms of objects remains an open challenge for artificial systems. The impact and application of robust visual object decomposition would be far-reaching. Models such as graph-structured networks that rely on handcrafted object representations have recently achieved remarkable results in a wide range of research areas, including reinforcement learning, physical modeling, and multi-agent control [Battaglia et al., 2018, Wang et al., 2018, Hamrick et al., 2017, Hoshen, 2017]. The prospect of acquiring visual object representations through unsupervised learning could be invaluable for extending the generality and applicability of such models.
A POMDP Model of Eye-Hand Coordination
Erez, Tom (Washington University in St. Louis) | Tramper, Julian J. (Radboud University) | Smart, William D (Washington University in St. Louis) | Gielen, Stan CAM (Radboud University)
This paper presents a generative model of eye-hand coordination. We use numerical optimization to solve for the joint behavior of an eye and two hands, deriving a predicted motion pattern from first principles, without imposing heuristics. We model the planar scene as a POMDP with 17 continuous state dimensions. Belief-space optimization is facilitated by using a nominal-belief heuristic, whereby we assume (during planning) that the maximum likelihood observation is always obtained. Since a globally-optimal solution for such a high-dimensional domain is computationally intractable, we employ local optimization in the belief domain. By solving for a locally-optimal plan through belief space, we generate a motion pattern of mutual coordination between hands and eye: the eye's saccades disambiguate the scene in a task-relevant manner, and the hands' motions anticipate the eye's saccades. Finally, the model is validated through a behavioral experiment, in which human subjects perform the same eye-hand coordination task. We show that the simulation is congruent with the experimental results.
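The nominal-belief (maximum-likelihood observation) heuristic can be illustrated with a 1-D Gaussian belief: assuming the observation always equals the predicted mean makes the innovation zero, so during planning the belief mean evolves deterministically while the variance follows the usual Kalman update. The scalar dynamics and noise values below are illustrative, not the paper's 17-dimensional model.

```python
def belief_rollout_mlo(mu, sigma2, a, q, r, steps):
    """Roll out a 1-D Gaussian belief (mean mu, variance sigma2) under the
    maximum-likelihood-observation assumption. Dynamics: x' = a*x + noise
    with variance q; observation: y = x + noise with variance r."""
    means, variances = [mu], [sigma2]
    for _ in range(steps):
        mu = a * mu                   # predict mean
        sigma2 = a * a * sigma2 + q   # predict variance
        k = sigma2 / (sigma2 + r)     # Kalman gain
        # ML observation y == predicted mean -> innovation is zero, so the
        # mean is unchanged and only the uncertainty shrinks.
        sigma2 = (1 - k) * sigma2
        means.append(mu)
        variances.append(sigma2)
    return means, variances

means, variances = belief_rollout_mlo(mu=1.0, sigma2=1.0, a=1.0, q=0.0, r=1.0, steps=3)
```

This determinism is what makes local trajectory optimization through belief space tractable: the planner optimizes over one nominal belief trajectory instead of a tree of possible observation outcomes.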